
Add BaseSearch and RandomSearch #25

Merged
1 commit merged into ljvmiranda921:master on Sep 13, 2017

Conversation

@SioKCronin (Collaborator) commented on Sep 10, 2017

This commit adds a SearchBase class for GridSearch and RandomSearch,
which defines an __init__ that sets all attributes for instances,
as well as score and search methods.

This commit also adds RandomSearch, which searches for optimal
performance on an objective function over combinations of
hyperparameter values sampled within specified bounds, for a
specified number of iterations.

Author: SioKCronin
Email: siobhankcronin@gmail.com

@SioKCronin (Collaborator, Author)

I'm curious to hear your thoughts on this, and am totally open to reorganize/iterate. 🔧

I imagine the structure will depend on how this will be used. I suppose having GridSearch and RandomSearch as separate methods might feel a bit clunky if a user doesn't really have a preference and just wants the best parameters. Maybe in that case there is simply a search method that defaults to GridSearch, and they can toggle to RandomSearch with bounds and iterations if they want a little more control. In general, I like that users can set a low number of iterations for RandomSearch in situations where an exhaustive search of GridSearch might not be practical.

@ljvmiranda921 (Owner)

Got it! Thanks @SioKCronin! 😃
Okay, will review this by the end of the week!

@ljvmiranda921 ljvmiranda921 self-requested a review September 11, 2017 04:14
@ljvmiranda921 ljvmiranda921 added the enhancement Feature requests label Sep 11, 2017
@ljvmiranda921 (Owner) left a comment

Hey @SioKCronin, nice work! Thanks a lot!

It's not yet the end of the week but I got some unexpected free time so I'm now writing this review 😸

As for the structure of the search classes, I believe having separate classes for RandomSearch and GridSearch is already fine. Your implementation is good 👍. I'm more for having classes that are as atomic as possible. In this manner, it's easier to construct higher-level methods that combine these atoms in whatever way we please.

This ties into the use case where someone doesn't really have a preference on which search method to use. A naive suggestion would be to expound in the documentation on the benefits of both, and the necessary tradeoffs in using one over the other, just to make the user conscious of their choice.

In the same manner, we can write a higher-level method that just calls GridSearch by default, with a simple toggle for RandomSearch, although I don't know where we can put this higher-level method.
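For illustration, here's a minimal sketch of such a wrapper, assuming GridSearch and RandomSearch can be imported from a pyswarms.utils.search module and share the SearchBase signature quoted later in this review (the wrapper name, the method toggle, and the import path are all hypothetical, not part of this PR):

# Hypothetical convenience wrapper: GridSearch by default,
# RandomSearch when the caller explicitly asks for it.
from pyswarms.utils.search import GridSearch, RandomSearch

def search(optimizer, n_particles, dimensions, options, objective_func,
           iters, method="grid", **random_kwargs):
    """Run a hyperparameter search, defaulting to an exhaustive GridSearch."""
    if method == "random":
        # Extra keyword arguments (e.g. bounds or the number of sampled
        # combinations) are simply forwarded to RandomSearch.
        searcher = RandomSearch(optimizer, n_particles, dimensions, options,
                                objective_func, iters, **random_kwargs)
    else:
        searcher = GridSearch(optimizer, n_particles, dimensions, options,
                              objective_func, iters)
    return searcher.search()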

Lastly, for the previous commit, 48a279f, I added support for Python 2.7. Usually we add something like this at the top level:

# Import from __future__
from __future__ import with_statement
from __future__ import absolute_import
from __future__ import print_function

from past.builtins import xrange

This enables print to be used as a function with parentheses when working with Python 2. The last import lets you use Python 2's xrange in Python 3.

This may seem a bit confusing: in Python 2, the range method gives you a list, but in Python 3 it gives you an iterator.

# In Python 2

type(range(5)) # range gives you a list
>>> <type 'list'>

type(xrange(5)) # xrange gives you an iterator
>>> <type 'xrange'>

# In Python 3

type(range(5)) # range gives you an iterator
>>> <class 'range'>

type(xrange(5)) # xrange is not supported
>>> NameError: name 'xrange' is not defined

Iterating through an iterator is much faster than iterating through a list. Python 2's xrange method gives you an iterator, while Python 3 doesn't support xrange unless we import it from past.builtins. This means that we have to change all range calls into xrange so that we get an iterator whether we use Python 2 or 3.

# Import xrange from past.builtins
from past.builtins import xrange

# In Python 2

type(xrange(5)) # xrange gives you an iterator
>>> <type 'xrange'>

# In Python 3
type(xrange(5)) # xrange gives you an iterator
>>> <class 'range'>

All in all, thank you so much for your contributions! It's really awesome 👍


Attributes
----------
optimizer: PySwarms class
@ljvmiranda921 (Owner)

We can write optimizer: PySwarms class in a modular format: optimizer : pyswarms.single. This already covers both pyswarms.single.LocalBestPSO and pyswarms.single.GlobalBestPSO 😄
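For instance, a sketch of how that attribute entry could read in the numpydoc-style docstring (the surrounding class body and description here are only illustrative):

class SearchBase(object):
    """Base class for hyperparameter search.

    Attributes
    ----------
    optimizer : pyswarms.single
        The PSO optimizer class to evaluate, e.g. pyswarms.single.GlobalBestPSO
        or pyswarms.single.LocalBestPSO.
    """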

@SioKCronin (Collaborator, Author)

Got it ✅

self.iters = iters

def generate_score(self, options):
"""Generates score for optimizer's performance on objective function."""
@ljvmiranda921 (Owner)

Even if this is not a user-facing method, maybe we can add docstrings here? Just for others who would look at the code. 👍

"""Generates score for optimizer's performance on objective function.

Parameters
-------------

options : dict
    <my description>
"""

The same goes for other methods 😄

@SioKCronin (Collaborator, Author)

Got it ✅

idx, self.best_score = max(enumerate(scores), key=op.itemgetter(1))
else:
idx, self.best_score = min(enumerate(scores), key=op.itemgetter(1))

@ljvmiranda921 (Owner)

I think it is fine even if we remove the else construct because the min serves as the default behaviour. Can we confirm this based on your unit tests?

# Catches that maximum bool flag
if maximum:
    idx, self.best_score = max(enumerate(scores), key=op.itemgetter(1))

# Default behaviour
idx, self.best_score = min(enumerate(scores), key=op.itemgetter(1))

@SioKCronin (Collaborator, Author)

Cool. I set this up to first run the default assignments, and then reassign if the maximum bool flag is caught (just flipping the order of what you show here). 🔄
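A minimal sketch of that flipped order, mirroring the snippet above (op is the operator module, as in the original code):

# Default behaviour: take the minimum score
idx, self.best_score = min(enumerate(scores), key=op.itemgetter(1))

# Reassign if the maximum bool flag is caught
if maximum:
    idx, self.best_score = max(enumerate(scores), key=op.itemgetter(1))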


class GridSearch(object):
class GridSearch(SearchBase):
"""Exhaustive search of optimal performance on selected objective function
over all combinations of specified hyperparameter values."""
@ljvmiranda921 (Owner)

Hmmm... I may be wrong or my GitHub is not working properly, but does this need to have an __init__ method similar to RandomSearch()? 😕 😄

You also mentioned the possibility that GridSearch will take very long when we're searching over a lot of hyperparameter values and too many iterations 👍. Maybe we can add a logger.warn? Check the implementations in GlobalBestPSO and LocalBestPSO.

There is a logger.info('My information') call in place of print() functions. I also initialized it with logger = logging.getLogger(__name__). You can do the same thing but call logger.warn() instead.

Thus we can have something like:

if condition_is_met:
    logger.warn('Using GridSearch will take so much time') 
# of course you can write a better warning than mine hehe

This will still run a search, only with an additional warning.
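Put together, a sketch of what that could look like (the n_combinations variable and the threshold here are only placeholders):

import logging

logger = logging.getLogger(__name__)

# Placeholder condition: warn when the grid of hyperparameter
# combinations becomes very large.
if n_combinations > 10000:
    logger.warning('GridSearch over %d combinations may take a long time',
                   n_combinations)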

@SioKCronin (Collaborator, Author) commented on Sep 12, 2017

Ha! Yes, I think I thought I was being clever because if no new attributes are added in the GridSearch __init__, it could just inherit SearchBase's __init__, which would be retrieved when creating a GridSearch instance. But looking at it now, I think it's clearer to call super().__init__ explicitly.

def __init__(self, optimizer, n_particles, dimensions, options,
             objective_func, iters, bounds=None, velocity_clamp=None):
    """Initializes the paramsearch."""

    # Assign attributes
    super().__init__(optimizer, n_particles, dimensions, options,
                     objective_func, iters, bounds=bounds,
                     velocity_clamp=velocity_clamp)

@SioKCronin (Collaborator, Author) commented on Sep 12, 2017

Thanks for pointing me to 'logger.warning'. In looking at search I realized I'm not sure how much of an issue runtime will be, even for GridSearch with lots of hyperparameter options. My hunch was that runtime on big search spaces would need to be flagged for users, but I don't have a good sense of when a user would input a really long list for each of the 'c' weights, for instance.

What I'd like to do next is write up an example for the docs on Tuning PSO hyperparameters that shows these two methods in action, and use that writing as a chance to read the literature on conventions for setting PSO hyperparameters. I think this will help me have a better sense of what kinds of situations might come up. 📚

What I can see happening is including logger.warn, but also having clear documentation that speaks to the conventions for setting options (and perhaps resources linked in the Example) that would steer users away from setting search values that don't quite make sense, and might take too long to run.

@ljvmiranda921 (Owner) commented on Sep 13, 2017

Actually, I realized that it's kinda hard to pinpoint when to issue a warning based on user input. I believe your plan is good for now 👍

Regarding the __init__, the reason why I often add the super() is that it enables me to pass defaults into the base class. So just in case I want defaults in GridSearch etc., I just declare them on GridSearch's init, then do the super. But for now I guess this implementation is fine.
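For example, a sketch of passing a GridSearch-specific default down to the base class (the velocity_clamp default here is purely illustrative):

class GridSearch(SearchBase):
    """Exhaustive search of optimal performance on selected objective function
    over all combinations of specified hyperparameter values."""

    def __init__(self, optimizer, n_particles, dimensions, options,
                 objective_func, iters, bounds=None, velocity_clamp=(0, 1)):
        # Declare the subclass-specific default here, then forward
        # everything to the base class.
        super().__init__(optimizer, n_particles, dimensions, options,
                         objective_func, iters, bounds=bounds,
                         velocity_clamp=velocity_clamp)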

@SioKCronin (Collaborator, Author)

@ljvmiranda921 Your reviews are the best!! 🥇 Thanks for the clear suggestions. I'll have time tomorrow morning to make these changes and will respond fully to the questions you flagged. More soon! 😄

@ljvmiranda921 (Owner)

Hey @SioKCronin,

Good job, thanks a lot for your efforts! We are supposed to merge already, but I apologize for forgetting to mention one thing: have you added an __init__.py file in pyswarms/tests/utils/search/? It seems that the tests in Travis don't include the ones you've written.

Just add an __init__.py file in the said directory so that the interpreter can recognize it. You can run the tests first on your local machine before committing or pushing. Just go to the top-level directory, i.e., outside the pyswarms/ folder where you can see both pyswarms/ and tests/, then run:

$ python -m unittest tests.utils.search.test_gridsearch

or

$ python -m unittest tests.utils.search.test_randomsearch

In Travis-CI, I just call python -m unittest discover, which looks for all files and methods that have a pattern test_<anything>. If all of the tests pass, then it's all good!
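For reference, the layout being described would look roughly like this (each package directory needs its own __init__.py so unittest discovery can pick up the test modules):

tests/
    __init__.py
    utils/
        __init__.py
        search/
            __init__.py
            test_gridsearch.py
            test_randomsearch.py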

Once everything is clear, we can merge right away.

If you wish to add examples and more documentation, just issue another Pull Request 👍

Thank you so much and I hope everything's good over there!

This commit adds a BaseSearch class for GridSearch and RandomSearch,
which defines an __init__ that sets all attributes for instances,
as well as score and search methods.

This commit also adds RandomSearch, which searches for optimal
performance on an objective function over combinations of
hyperparameter values sampled within specified bounds, for a
specified number of iterations.

This commit includes 'xrange' functionality for RandomSearch to support
Python 2.7.

Author: SioKCronin
Email: siobhankcronin@gmail.com
@SioKCronin (Collaborator, Author)

Cool. I've added the __init__.py and the tests pass. I'll start putting together an Example for these two search methods.

@ljvmiranda921 ljvmiranda921 merged commit 584f246 into ljvmiranda921:master Sep 13, 2017
@ljvmiranda921 (Owner)

Great, @SioKCronin! I have already merged your branch. Thanks a lot!
